Learning to Search in Reinforcement Learning
In this thesis, we investigate the use of search-based algorithms with deep neural networks to tackle a wide range of problems, from board games to video games and beyond. Drawing inspiration from AlphaGo, the first computer program to achieve superhuman performance in the game of Go, we developed a new algorithm, AlphaZero. AlphaZero is a general reinforcement learning algorithm that combines deep neural networks with Monte Carlo tree search for planning and learning. Starting completely from scratch, without any prior human knowledge beyond the basic rules of the game, AlphaZero achieved superhuman performance in Go, chess, and shogi. Subsequently, building upon the success of AlphaZero, we investigated ways to extend our methods to problems in which the rules are not known or cannot be hand-coded. This line of work led to the development of MuZero, a model-based reinforcement learning agent that builds a deterministic internal model of the world and uses it to construct plans in its imagination. We applied our method to Go, chess, shogi, and the classic Atari suite of video games, achieving superhuman performance. MuZero is the first RL algorithm to master a variety of both canonical challenges for high-performance planning and visually complex problems using the same principles. Finally, we describe Stochastic MuZero, a general agent that extends the applicability of MuZero to highly stochastic environments. We show that our method achieves superhuman performance in stochastic domains such as backgammon and the classic game of 2048, while matching the performance of MuZero in deterministic ones like Go.
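The planning loop described above, Monte Carlo tree search guided by a policy/value network, can be sketched as follows. This is a minimal single-player illustration, not the thesis implementation: the toy two-action game, the stubbed "network", and the hyperparameters are all invented for illustration.

```python
import math
import random

class Node:
    """One search-tree node, AlphaZero-style."""
    def __init__(self, prior):
        self.prior = prior        # P(s, a) from the policy head
        self.visit_count = 0
        self.value_sum = 0.0
        self.children = {}        # action -> Node

    def value(self):
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def ucb_score(parent, child, c_puct=1.25):
    # PUCT rule: exploit the mean value, explore in proportion to the prior.
    explore = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.value() + explore

def legal_actions(state):
    return (0, 1)                 # toy two-action game

def apply_action(state, action):
    return state + (action,)

def dummy_network(state):
    # Stand-in for the neural network: uniform policy, random value in [-1, 1].
    actions = legal_actions(state)
    policy = {a: 1.0 / len(actions) for a in actions}
    return policy, random.uniform(-1, 1)

def mcts(root_state, num_simulations=50):
    root = Node(prior=1.0)
    policy, _ = dummy_network(root_state)
    for a, p in policy.items():
        root.children[a] = Node(prior=p)

    for _ in range(num_simulations):
        node, state, path = root, root_state, [root]
        # Selection: descend until a leaf, following the PUCT score.
        while node.children:
            action, node = max(node.children.items(),
                               key=lambda kv: ucb_score(path[-1], kv[1]))
            state = apply_action(state, action)
            path.append(node)
        # Expansion + evaluation with the network.
        policy, value = dummy_network(state)
        for a, p in policy.items():
            node.children[a] = Node(prior=p)
        # Backup along the search path (single-player; two-player games
        # would negate the value at alternating depths).
        for n in path:
            n.visit_count += 1
            n.value_sum += value

    # Act on the most-visited root action (greedy; training would sample).
    return max(root.children.items(), key=lambda kv: kv[1].visit_count)[0]
```

The visit-count-based action choice at the end is what makes search depth translate into stronger play: actions that survive repeated PUCT selection accumulate visits.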
Playing Atari with Deep Reinforcement Learning
We present the first deep learning model to successfully learn control
policies directly from high-dimensional sensory input using reinforcement
learning. The model is a convolutional neural network, trained with a variant
of Q-learning, whose input is raw pixels and whose output is a value function
estimating future rewards. We apply our method to seven Atari 2600 games from
the Arcade Learning Environment, with no adjustment of the architecture or
learning algorithm. We find that it outperforms all previous approaches on six
of the games and surpasses a human expert on three of them.
Comment: NIPS Deep Learning Workshop 201
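The Q-learning variant described above can be illustrated in miniature. Here the convolutional network is replaced by a lookup table over a toy chain environment, but the update rule, regressing toward the bootstrapped TD target r + γ·max_a′ Q(s′, a′), is the same one the paper's network is trained on. The environment, hyperparameters, and ε-greedy schedule below are illustrative, not the paper's.

```python
import random

GAMMA = 0.9       # discount factor
ALPHA = 0.5       # learning rate
ACTIONS = (0, 1)  # 0 = left, 1 = right

def env_step(state, action):
    # Toy 5-state chain: reaching state 4 yields reward 1 and ends the episode.
    nxt = max(0, min(4, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == 4 else 0.0
    return nxt, reward, nxt == 4

def train(episodes=500, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(5) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy behaviour policy
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a_: q[(s, a_)])
            s2, r, done = env_step(s, a)
            # TD target: r + gamma * max_a' Q(s', a'), zero beyond terminals.
            target = r if done else r + GAMMA * max(q[(s2, a_)] for a_ in ACTIONS)
            q[(s, a)] += ALPHA * (target - q[(s, a)])
            s = s2
    return q
```

In the deep version, the table lookup becomes a forward pass through the CNN on raw pixels, and the tabular increment becomes a gradient step on the squared difference between Q(s, a) and the same target.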
Leaf age-dependent effects of foliar-sprayed CuZn nanoparticles on photosynthetic efficiency and ROS generation in <i>Arabidopsis thaliana</i>
Young and mature leaves of Arabidopsis thaliana were exposed by foliar spray to 30 mg L−1 of CuZn nanoparticles (NPs). The NPs were synthesized by a microwave-assisted polyol process and characterized by dynamic light scattering (DLS), X-ray diffraction (XRD), and transmission electron microscopy (TEM). The effects of CuZn NPs in Arabidopsis leaves were evaluated by chlorophyll fluorescence imaging analysis, which revealed spatiotemporal heterogeneity in the quantum efficiency of PSII photochemistry (ΦPSII) and in the redox state of the plastoquinone (PQ) pool (qp), measured 30 min, 90 min, 180 min, and 240 min after spraying. Photosystem II (PSII) function in young leaves was negatively affected, especially 30 min after spraying, at which point increased H2O2 generation correlated with a less oxidized state of the PQ pool. The photosynthetic efficiency of young leaves recovered only 240 min after spraying, by which time ROS accumulation had also returned to the level of control leaves. In contrast, a beneficial effect on PSII function in mature leaves was observed 30 min after the CuZn NPs spray, with increased ΦPSII, an increased electron transport rate (ETR), decreased singlet oxygen (1O2) formation, and H2O2 production at the level of control leaves. An explanation for this differential response is suggested.
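The fluorescence-derived parameters named in the abstract follow standard chlorophyll fluorescence relations (Genty-type ΦPSII, photochemical quenching qp, and ETR). The sketch below shows those generic formulas only; the sample fluorescence values, light intensity, and the absorptance/PSII-fraction constants are illustrative defaults, not data or methods from this study.

```python
def phi_psii(fs, fm_prime):
    # Effective quantum yield of PSII photochemistry: (Fm' - Fs) / Fm'
    return (fm_prime - fs) / fm_prime

def qp(fs, fm_prime, fo_prime):
    # Photochemical quenching, a proxy for the PQ-pool redox state:
    # (Fm' - Fs) / (Fm' - Fo')
    return (fm_prime - fs) / (fm_prime - fo_prime)

def etr(phi, par, absorptance=0.84, psii_fraction=0.5):
    # Electron transport rate: Phi_PSII * PAR * leaf absorptance * fraction
    # of absorbed light allocated to PSII (common default constants shown).
    return phi * par * absorptance * psii_fraction

# Illustrative steady-state fluorescence values for one leaf-area pixel:
fs, fm_p, fo_p = 500.0, 1500.0, 400.0
phi = phi_psii(fs, fm_p)   # (1500-500)/1500 = 0.667
q = qp(fs, fm_p, fo_p)     # 1000/1100  = ~0.909
rate = etr(phi, par=200)   # 0.667*200*0.84*0.5 = 56.0
```

Imaging systems compute these per pixel, which is what makes the spatiotemporal heterogeneity of ΦPSII and qp across a leaf visible.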